Reproducibility: The New Frontier in AI Governance

Mason-Williams, Israel, Mason-Williams, Gabryel

arXiv.org Artificial Intelligence

AI policymakers are responsible for delivering effective governance mechanisms that can provide safe, aligned and trustworthy AI development. However, the information environment offered to policymakers is characterised by an unnecessarily low signal-to-noise ratio, favouring regulatory capture and creating deep uncertainty and divides over which risks should be prioritised from a governance perspective. We posit that current publication speeds in AI, combined with the lack of strong scientific standards in the form of weak reproducibility protocols, effectively erode the power of policymakers to enact meaningful policy and governance protocols. Our paper outlines how AI research could adopt stricter reproducibility guidelines to assist governance endeavours and improve consensus on the AI risk landscape. We evaluate the forthcoming reproducibility crisis within AI research through the lens of crises in other scientific domains, providing a commentary on how adopting preregistration, increased statistical power and the publication of negative results can enable effective AI governance. While we maintain that AI governance must be reactive due to AI's significant societal implications, we argue that policymakers and governments must consider reproducibility protocols a core tool in the governance arsenal and demand higher standards for AI research. Code to replicate data and figures: https://github.com/IFMW01/reproducibility-the-new-frontier-in-ai-governance
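The paper's call for increased statistical power can be made concrete with a back-of-the-envelope sample-size calculation. The sketch below uses the standard normal approximation for a two-sided, two-sample comparison; the function name and default thresholds are illustrative choices, not taken from the paper.

```python
import math
from statistics import NormalDist

def required_sample_size(effect_size: float, alpha: float = 0.05, power: float = 0.8) -> int:
    """Per-group sample size for a two-sided two-sample test (normal approximation)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the significance level
    z_beta = z.inv_cdf(power)           # critical value for the desired power
    n = 2 * ((z_alpha + z_beta) / effect_size) ** 2
    return math.ceil(n)

# A "medium" effect (Cohen's d = 0.5) at 80% power needs ~63 runs per condition;
# a small effect (d = 0.2) needs ~393 — far more seeds than most ML papers report.
print(required_sample_size(0.5))
print(required_sample_size(0.2))
```

The point of the sketch: detecting small effect sizes, typical of incremental benchmark gains, requires hundreds of independent runs per condition, which is why underpowered single-seed comparisons feed the noise the paper describes.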


Confronting the Reproducibility Crisis: A Case Study in Validating Certified Robustness

Moulton, Richard H., McCully, Gary A., Hastings, John D.

arXiv.org Artificial Intelligence

Reproducibility is a cornerstone of scientific research, enabling validation, extension, and progress. However, the rapidly evolving nature of software and dependencies poses significant challenges to reproducing research results, particularly in fields like adversarial robustness for deep neural networks, where complex codebases and specialized toolkits are utilized. This paper presents a case study of attempting to validate the results on certified adversarial robustness in "SoK: Certified Robustness for Deep Neural Networks" using the VeriGauge toolkit. Despite following the documented methodology, numerous software and hardware compatibility issues were encountered, including outdated or unavailable dependencies, version conflicts, and driver incompatibilities. While a subset of the original results could be run, key findings related to the empirical robust accuracy of various verification methods proved elusive due to these technical obstacles, as well as slight discrepancies in the test results. This practical experience sheds light on the reproducibility crisis afflicting adversarial robustness research, where a lack of reproducibility threatens scientific integrity and hinders progress. The paper discusses the broader implications of this crisis, proposing potential solutions such as containerization, software preservation, and comprehensive documentation practices. Furthermore, it highlights the need for collaboration and standardization efforts within the research community to develop robust frameworks for reproducible research. By addressing the reproducibility crisis head-on, this work aims to contribute to the ongoing discourse on scientific reproducibility and advocate for best practices that ensure the reliability and validity of research findings within not only adversarial robustness, but security and technology research as a whole.
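One of the documentation practices the paper advocates, recording the exact software environment alongside results, can be sketched in a few lines of standard-library Python. The manifest fields below are an illustrative minimum (the paper does not prescribe a schema); in practice this would sit next to containerization, not replace it.

```python
import json
import platform
import sys

def environment_manifest() -> dict:
    """Capture interpreter and OS details a later reproduction attempt can diff against."""
    return {
        "python_version": sys.version.split()[0],
        "implementation": platform.python_implementation(),
        "os": platform.system(),
        "machine": platform.machine(),
    }

# Written alongside experimental results, e.g. results/manifest.json.
print(json.dumps(environment_manifest(), indent=2))
```

Had such a manifest shipped with the original VeriGauge results, the version conflicts and driver incompatibilities described above could at least have been diagnosed against a known-good baseline.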


Reproducibility in Machine Learning-Driven Research

Semmelrock, Harald, Kopeinik, Simone, Theiler, Dieter, Ross-Hellauer, Tony, Kowald, Dominik

arXiv.org Artificial Intelligence

Research is facing a reproducibility crisis, in which the results and findings of many studies are difficult or even impossible to reproduce. This also holds in machine learning (ML) and artificial intelligence (AI) research, often due to unpublished data and/or source code, and due to sensitivity to ML training conditions. Although different solutions to address this issue are discussed in the research community, such as using ML platforms, the level of reproducibility in ML-driven research is not increasing substantially. Therefore, in this mini survey, we review the literature on reproducibility in ML-driven research with three main aims: (i) reflect on the current state of ML reproducibility in various research fields, (ii) identify reproducibility issues and barriers that exist in research fields applying ML, and (iii) identify potential drivers, such as tools, practices, and interventions, that support ML reproducibility. With this, we hope to inform decisions on the viability of different solutions for supporting ML reproducibility.


Last Week in AI #180: Meta's troubled chat bot, AI in femtech, Science AI's reproducibility crises, and more!

#artificialintelligence

Did Meta not learn anything from Microsoft's infamous chatbot Tay? On August 5, Meta released BlenderBot 3, an AI chatbot, to users in the US. As Meta warned, BlenderBot indeed was "likely to make untrue or offensive statements": it described Mark Zuckerberg as "too creepy and manipulative" to a reporter from Insider and claimed Trump was still president and "always will be" to a Wall Street Journal reporter. Users can flag BlenderBot's inappropriate and offensive responses, and Meta claims it has reduced offensive responses by 90 percent. Our Take: Color me amused and not surprised.


Sloppy Use of Machine Learning Is Causing a 'Reproducibility Crisis' in Science

#artificialintelligence

Machine learning involves feeding an algorithm data from the past that tunes it to operate on future, unseen data.
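That one-sentence definition can be made concrete with a toy chronological split: tune on past observations, then apply the tuned rule to later, unseen ones. Everything below is an illustrative stand-in, not an example from the article.

```python
# Chronological split: fit on past observations, evaluate on later, unseen ones.
observations = [(x, 2 * x) for x in range(10)]  # toy series with a known rule y = 2x
train, test = observations[:8], observations[8:]

# "Training": estimate the slope from past data (mean of y/x, skipping x = 0).
nonzero = [(x, y) for x, y in train if x != 0]
slope = sum(y / x for x, y in nonzero) / len(nonzero)

# "Inference": apply the tuned rule to future, unseen inputs.
predictions = [slope * x for x, _ in test]
print(predictions)  # [16.0, 18.0] — matches the held-out targets, as the rule is exact
```

The sloppiness the headline refers to typically enters exactly at this boundary: when information from the "future" half leaks into the tuning step.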


Sloppy Use of Machine Learning is Causing a 'Reproducibility Crisis' in Science

WIRED

History shows civil wars to be among the messiest, most horrifying of human affairs. So Princeton professor Arvind Narayanan and his PhD student Sayash Kapoor got suspicious last year when they discovered a strand of political science research claiming to predict when a civil war will break out with more than 90 percent accuracy, thanks to artificial intelligence. A series of papers described astonishing results from using machine learning, the technique beloved by tech giants that underpins modern AI. Applying it to data such as a country's gross domestic product and unemployment rate was said to beat more conventional statistical methods at predicting the outbreak of civil war by almost 20 percentage points. Yet when the Princeton researchers looked more closely, many of the results turned out to be a mirage.


Could machine learning fuel a reproducibility crisis in science?

#artificialintelligence

A CT scan of a tumor in human lungs; researchers are experimenting with AI algorithms that can spot early signs of the disease (credit: K. H. Fung/SPL). From biomedicine to political sciences, researchers increasingly use machine learning as a tool to make predictions on the basis of patterns in their data. But the claims in many such studies are likely to be overblown, according to a pair of researchers at Princeton University in New Jersey. They want to sound an alarm about what they call a "brewing reproducibility crisis" in machine-learning-based sciences. Machine learning is being sold as a tool that researchers can learn in a few hours and use by themselves -- and many follow that advice, says Sayash Kapoor, a machine-learning researcher at Princeton.


Leakage and the Reproducibility Crisis in ML-based Science

Kapoor, Sayash, Narayanan, Arvind

arXiv.org Artificial Intelligence

The use of machine learning (ML) methods for prediction and forecasting has become widespread across the quantitative sciences. However, there are many known methodological pitfalls, including data leakage, in ML-based science. In this paper, we systematically investigate reproducibility issues in ML-based science. We show that data leakage is indeed a widespread problem and has led to severe reproducibility failures. Specifically, through a survey of literature in research communities that adopted ML methods, we find 17 fields where errors have been found, collectively affecting 329 papers and in some cases leading to wildly overoptimistic conclusions. Based on our survey, we present a fine-grained taxonomy of 8 types of leakage that range from textbook errors to open research problems. We argue for fundamental methodological changes to ML-based science so that cases of leakage can be caught before publication. To that end, we propose model info sheets for reporting scientific claims based on ML models that would address all types of leakage identified in our survey. To investigate the impact of reproducibility errors and the efficacy of model info sheets, we undertake a reproducibility study in a field where complex ML models are believed to vastly outperform older statistical models such as Logistic Regression (LR): civil war prediction. We find that all papers claiming the superior performance of complex ML models compared to LR models fail to reproduce due to data leakage, and complex ML models don't perform substantively better than decades-old LR models. While none of these errors could have been caught by reading the papers, model info sheets would enable the detection of leakage in each case.
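One leakage type from the paper's taxonomy, duplicate records landing on both sides of the train/test split, can be illustrated with a deliberately stylized, standard-library sketch. The 1-NN scorer and toy data below are hypothetical, not the paper's code or its civil-war benchmark.

```python
def nearest_neighbor_accuracy(train, test):
    """Score a memorizing 1-NN classifier: predict the label of the closest train point."""
    correct = 0
    for x, y in test:
        _, predicted = min(train, key=lambda row: abs(row[0] - x))
        correct += predicted == y
    return correct / len(test)

# Ten records whose alternating labels carry no smooth signal a neighbor could exploit.
records = [(i, i % 2) for i in range(10)]

# Honest protocol: deduplicate first, then split into disjoint train and test sets.
honest = nearest_neighbor_accuracy(records[:8], records[8:])

# Leaky protocol: the dataset contains duplicates, and the split puts one copy of
# every record in train and the other in test (taxonomy type: duplicates).
doubled = records + records
leaky = nearest_neighbor_accuracy(doubled[:10], doubled[10:])

print(honest, leaky)  # 0.5 1.0 — leakage makes a memorizing model look perfect
```

This is the mechanism behind the "wildly overoptimistic conclusions" above: the leaky split rewards memorization, so the reported score says nothing about performance on genuinely new data.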


GitHub Is Bad for AI: Solving ML Reproducibility - DZone AI

#artificialintelligence

There is a crisis in machine learning that is preventing the field from progressing as fast as it could. It stems from a broader predicament surrounding reproducibility that impacts scientific research in general. A Nature survey of 1,500 scientists revealed that 70% of researchers have tried and failed to reproduce another scientist's experiments, and over 50% have failed to reproduce their own work. Reproducibility, also called replicability, is a core principle of the scientific method and helps ensure the results of a given study aren't a one-off occurrence but instead represent a replicable observation. In computer science, reproducibility has a narrower definition: any results should be documented by making all data and code available so that the computations can be executed again with the same results.
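That narrower, computational definition — rerun the same code on the same data and get the same results — hinges on controlling nondeterminism. A minimal sketch, assuming the only source of nondeterminism is the random number generator (real ML pipelines also contend with GPU kernels, threading, and library versions):

```python
import random

def run_experiment(seed: int) -> list[float]:
    """A stand-in 'experiment' whose only nondeterminism is the RNG."""
    rng = random.Random(seed)  # a seeded, private RNG: no hidden global state
    return [round(rng.gauss(0, 1), 6) for _ in range(5)]

first = run_experiment(seed=42)
second = run_experiment(seed=42)
print(first == second)  # True: same code, same data, same seed -> same results
```

Publishing the seed alongside the code and data is what turns "we ran it again" into "the computations can be executed again with the same results".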